Abstract: In this paper, we develop a methodology for designing lower-error, area-efficient 2's-complement fixed-width multipliers. In these multipliers, the basic multiplication follows the Baugh-Wooley algorithm, and the designs have been implemented on Field Programmable Gate Array (FPGA) devices. The approach is based on the fact that the multiplication operations used in multimedia applications (such as DSP) usually have the special fixed-width property, i.e., their input data and output product have the same bit width. For many practical DSP applications, only an n-bit product is required. It is obtained by directly truncating the n least-significant bits and preserving the n most-significant bits. However, this fixed-width operation introduces significant errors, which are undesirable in DSP applications. By properly choosing the generalized index and binary thresholding, we derive a better error-compensation bias to reduce the truncation error. The proposed low-error fixed-width multiplier shows better error performance than other existing multiplier structures.
I. INTRODUCTION

Multiplication is an important operation in many algorithms used in scientific computation, such as Digital Signal Processing (DSP). The computational complexity of the algorithms run on Digital Signal Processors (DSPs) has gradually increased over the years. Therefore, DSPs require fast and efficient parallel multipliers for both general-purpose and application-specific architectures. In the multipliers considered here, the basic multiplication follows the Baugh-Wooley algorithm, which produces a 2n-bit output from an n-bit multiplier and an n-bit multiplicand. DSP applications make extensive use of multiplication and squaring functions. A full-width n x n digital multiplier computes the 2n-bit product as a weighted sum of partial products. If the product is truncated to n bits, the least-significant columns of the partial-product matrix contribute little to the final result. To take advantage of this, truncated multipliers do not form all of the least-significant columns of the partial-product matrix. Eliminating these columns significantly reduces the area and power consumption of the arithmetic unit, and it also decreases the delay. For many practical applications, only an n-bit product is required. It is obtained by directly truncating the n least-significant bits and preserving the n most-significant bits. However, this fixed-width operation introduces significant errors, which are undesirable in DSP applications [1].

To reduce the truncation error, [2] proposed an analytical technique to generate a correction term. The main drawback of this design is that the correction bias added to offset the truncation error is a constant. It does not depend on the inputs fed to the multiplier. [3] proposed a fixed-width multiplier with a constant-correction technique that introduces a degree of flexibility in the number of truncated columns. This lets designers trade area savings against better error correction. However, two problems remain: 1) how to choose proper indices, and 2) whether other lower-error multipliers exist. This paper proposes a general methodology for designing lower-error 2's-complement fixed-width multipliers with w ≥ 1, where n + w is the number of most significant columns of the partial-product array that are kept.

The rest of the paper is organized as follows. Section II discusses the Baugh-Wooley multiplier, and Section III gives the details of the proposed algorithm. Section IV presents the results and Section V provides the conclusions. The references are listed at the end.

II. BAUGH WOOLEY MULTIPLICATION 

The Baugh-Wooley multiplication algorithm is an efficient way to handle the sign bits. This technique was developed to design regular multipliers suited to 2's-complement numbers [1].

Consider 2's-complement integer operands. An n-bit multiplicand X and an n-bit multiplier Y can be represented, respectively, by
$$X = -x_{n-1}2^{n-1} + \sum_{i=0}^{n-2} x_i 2^{i} \tag{1}$$

$$Y = -y_{n-1}2^{n-1} + \sum_{j=0}^{n-2} y_j 2^{j} \tag{2}$$

where $x_i, y_j \in \{0, 1\}$.

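The standard product $P_{\text{standard}}$ can then be written as

$$P_{\text{standard}} = x_{n-1}y_{n-1}2^{2n-2} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} x_i y_j 2^{i+j} + 2^{n-1}\sum_{j=0}^{n-2}\overline{x_{n-1}y_j}\,2^{j} + 2^{n-1}\sum_{i=0}^{n-2}\overline{x_i y_{n-1}}\,2^{i} + 2^{n} - 2^{2n-1} \tag{3}$$

where the overbar denotes the complement (NAND) of a partial-product bit.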
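The Python sketch below is a sanity check on Eq. (3). It uses the modified Baugh-Wooley form shown there, with the sign-row partial products complemented (NAND terms) plus the constant $2^n - 2^{2n-1}$. The sketch evaluates the right-hand side term by term and compares it with ordinary signed multiplication for every pair of 4-bit operands. The helper names `bits`, `to_signed` and `baugh_wooley` are our own, not from the paper.

```python
# Exhaustive check of Eq. (3) (modified Baugh-Wooley form) for a small word length.
# Operands are n-bit two's-complement patterns stored as ints in [0, 2**n).

def bits(v, n):
    """Return the n bits of v, least significant first (bits(v, n)[i] is x_i)."""
    return [(v >> i) & 1 for i in range(n)]


def to_signed(v, n):
    """Value of an n-bit two's-complement pattern, as in Eqs. (1) and (2)."""
    return v - (1 << n) if (v >> (n - 1)) & 1 else v


def baugh_wooley(a, b, n):
    """Evaluate the right-hand side of Eq. (3) term by term."""
    x, y = bits(a, n), bits(b, n)
    p = (x[n - 1] * y[n - 1]) << (2 * n - 2)
    p += sum((x[i] * y[j]) << (i + j) for i in range(n - 1) for j in range(n - 1))
    # Complemented (NAND) sign-row terms: NOT(x_{n-1} y_j) and NOT(x_i y_{n-1}).
    p += sum((1 - x[n - 1] * y[j]) << (n - 1 + j) for j in range(n - 1))
    p += sum((1 - x[i] * y[n - 1]) << (n - 1 + i) for i in range(n - 1))
    return p + (1 << n) - (1 << (2 * n - 1))


n = 4
for a in range(1 << n):
    for b in range(1 << n):
        assert baugh_wooley(a, b, n) == to_signed(a, n) * to_signed(b, n)
```

Reduced modulo $2^{2n}$, the constant $2^n - 2^{2n-1}$ is the same as $2^{2n-1} + 2^n$. This is why a Baugh-Wooley array needs only fixed 1s in columns $n$ and $2n-1$.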

(A) Fixed Width Multiplication 

Multiplication based on the Baugh-Wooley algorithm produces a 2n-bit output from an n-bit multiplier and an n-bit multiplicand. However, DSP applications need only an n-bit output. Therefore, the fixed-width multiplier is obtained by truncating the least significant partial products and preserving the most significant partial products, as shown in Figure 1.
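The sketch below makes the truncation problem concrete. It keeps only the $n$ most significant bits of the exact $2n$-bit product and measures the resulting error over all 4-bit operand pairs. This is our illustration, and the helper names are hypothetical.

```python
# Direct truncation of a two's-complement product to its n most significant bits.

def to_signed(v, n):
    """Value of an n-bit two's-complement pattern."""
    return v - (1 << n) if (v >> (n - 1)) & 1 else v


def truncated_product(a, b, n):
    """Keep the n most significant bits of the 2n-bit product (no compensation).

    Python's >> on a negative int rounds toward minus infinity, which is exactly
    what discarding the low n bits of a two's-complement word does.
    """
    return (to_signed(a, n) * to_signed(b, n)) >> n


n = 4
errors = []
for a in range(1 << n):
    for b in range(1 << n):
        exact = to_signed(a, n) * to_signed(b, n)
        # Error = exact product minus the truncated result scaled back up.
        errors.append(exact - (truncated_product(a, b, n) << n))

print("min error:", min(errors), "max error:", max(errors))
print("mean error:", sum(errors) / len(errors))
```

The error is never negative, so plain truncation is biased. An error-compensation bias is meant to offset exactly this. Note that this sketch drops bits of the finished product. A fixed-width multiplier that never forms the low partial-product columns also loses the carries those columns would produce, so its uncompensated error is larger.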




Fig. 1: Partial-product array diagram for an n x n Baugh-Wooley multiplier.
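The most accurate fixed-width product is theoretically given as

$$P_{\text{standard}} \approx MP + \sigma_{\text{temp}} \times 2^{n} \tag{4}$$

where $MP$ and $LP$ are the most significant part (the $n$ most significant columns) and the least significant part (the $n$ least significant columns) of the partial-product array, $\lfloor\cdot\rceil$ denotes rounding to the nearest integer, and

$$\sigma_{\text{temp}} = \left\lfloor 2^{-n} LP \right\rceil = \left\lfloor \frac{1}{2}\left(\overline{x_{n-1}y_0} + x_{n-2}y_1 + \cdots + x_1y_{n-2} + \overline{x_0y_{n-1}}\right) + \frac{1}{4}\left(x_{n-2}y_0 + \cdots + x_0y_{n-2}\right) + \cdots + \frac{1}{2^{n-1}}\left(x_1y_0 + x_0y_1\right) + \frac{1}{2^{n}}\left(x_0y_0\right) \right\rceil \tag{5}$$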



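Here $\sigma_{\text{temp}}$ is an ideal error-compensation term corresponding to the true-rounding approach. It is infeasible to implement the truncated fixed-width multiplier with it unless an acceptable approximation is used. From equation (5), it is observed that $\sigma_{\text{temp}}$ is mainly affected by the first term, $\frac{1}{2}(\overline{x_{n-1}y_0} + x_{n-2}y_1 + \cdots + x_1y_{n-2} + \overline{x_0y_{n-1}})$, because it has the largest weight. We therefore split $\sigma_{\text{temp}}$ into a main error-compensation term $E_{\text{main}}$ and a remaining error-compensation term $E_{\text{remain}}$ [2, 3]:

$$E_{\text{main}} = \frac{1}{2}\left(\overline{x_{n-1}y_0} + x_{n-2}y_1 + \cdots + x_1y_{n-2} + \overline{x_0y_{n-1}}\right) \tag{6}$$

and

$$E_{\text{remain}} = \frac{1}{4}\left(x_{n-2}y_0 + \cdots + x_0y_{n-2}\right) + \cdots + \frac{1}{2^{n-1}}\left(x_1y_0 + x_0y_1\right) + \frac{1}{2^{n}}\left(x_0y_0\right) \tag{7}$$

Equation (5) can then be rewritten as

$$\sigma_{\text{temp}} = \left\lfloor E_{\text{main}} + E_{\text{remain}} \right\rceil \tag{8}$$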



Note that $\sigma_{\text{temp}}$ varies as the input bits $x_i$ or $y_j$ change.
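The quantities in Eqs. (5) to (8) can be computed directly from the partial-product bits. The sketch below assumes the column layout of Eq. (3), in which the two sign-row terms of column $n-1$ appear complemented. When forming $\sigma_{\text{temp}}$ it rounds ties upward, which is a tie-breaking choice of ours. It also checks two properties: $2^{-n}LP = E_{\text{main}} + E_{\text{remain}}$, and true rounding stays within half an output LSB. All function names are hypothetical.

```python
from fractions import Fraction
import math


def bits(v, n):
    return [(v >> i) & 1 for i in range(n)]


def to_signed(v, n):
    return v - (1 << n) if (v >> (n - 1)) & 1 else v


def lp_columns(a, b, n):
    """Partial-product bits of columns 0..n-1 (the LP part), keyed by column k = i + j.

    In the array of Eq. (3) the sign-row terms of column n-1,
    x_{n-1} y_0 and x_0 y_{n-1}, appear complemented.
    """
    x, y = bits(a, n), bits(b, n)
    cols = {k: [] for k in range(n)}
    for i in range(n):
        for j in range(n - i):          # ensures k = i + j <= n - 1
            t = x[i] & y[j]
            if i == n - 1 or j == n - 1:
                t ^= 1                  # NAND instead of AND
            cols[i + j].append(t)
    return cols


def sigma_temp(a, b, n):
    """Return (E_main, E_remain, sigma_temp) of Eqs. (6)-(8)."""
    cols = lp_columns(a, b, n)
    e_main = Fraction(sum(cols[n - 1]), 2)
    e_remain = sum(Fraction(sum(cols[k]), 2 ** (n - k)) for k in range(n - 1))
    # True rounding; ties are rounded up here (tie-breaking is an assumption).
    return e_main, e_remain, math.floor(e_main + e_remain + Fraction(1, 2))


n = 4
values = set()
for a in range(1 << n):
    for b in range(1 << n):
        e_main, e_remain, s = sigma_temp(a, b, n)
        lp = sum(sum(c) << k for k, c in lp_columns(a, b, n).items())
        # LP / 2**n equals E_main + E_remain, and rounding is within half an LSB.
        assert Fraction(lp, 1 << n) == e_main + e_remain
        assert abs(lp - s * (1 << n)) <= 1 << (n - 1)
        # P - LP is the MP part, so it must be a multiple of 2**n.
        assert (to_signed(a, n) * to_signed(b, n) - lp) % (1 << n) == 0
        values.add(s)
print(sorted(values))
```

The printed set contains more than one value, which shows that the ideal bias depends on the inputs rather than being a constant.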

Next, we define a generalized index. Here $w$ means that the $n + w$ most significant columns of the partial-product array are kept, as shown in Figure 1 [4]. The binary parameters are $(q_{n-1-w}, q_{n-2-w}, \ldots, q_0) \in \{0, 1\}$:

$$\theta_{\text{index},w}(q_{n-1-w}, q_{n-2-w}, \ldots, q_0) = (x_{n-1-w}y_0)^{q_{n-1-w}} + (x_{n-2-w}y_1)^{q_{n-2-w}} + \cdots + (x_0y_{n-1-w})^{q_0} \tag{9}$$



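The operator $(\cdot)^{q_k}$ is defined as

$$(x_iy_j)^{q_k} = \begin{cases} x_iy_j, & \text{if } q_k = 0,\\ x_i + y_j, & \text{if } q_k = 1, \end{cases} \tag{10}$$

where $x_iy_j$ is the logical AND and $x_i + y_j$ the logical OR of the two bits.

To introduce the generalized index into the error-compensation bias equation, we rewrite equation (8) as

$$\sigma_{\text{temp}} = \left\lfloor \theta_{Q,w} + \left[(E_{\text{main}} + E_{\text{remain}}) - \theta_{Q,w}\right] \right\rceil \tag{11}$$

where $\theta_{Q,w} = \theta_{\text{index},w}(q_{n-1-w}, \ldots, q_0)$ and the index is

$$Q = q_{n-1-w}\times 2^{n-1-w} + q_{n-2-w}\times 2^{n-2-w} + \cdots + q_0\times 2^{0}.$$

$Q$ therefore ranges from $0$ to $2^{n-w} - 1$.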
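The following is a direct transcription of Eqs. (9) and (10) and of the index $Q$. It reads the $+$ in Eq. (10) as a logical OR. For $w \ge 1$, column $n-1-w$ contains only ordinary AND partial products, so no complemented sign-row terms need special handling. The function name `generalized_index` is hypothetical.

```python
def bits(v, n):
    return [(v >> i) & 1 for i in range(n)]


def generalized_index(a, b, n, w, Q):
    """theta_{Q,w} of Eqs. (9)-(10) for operand patterns a (bits x) and b (bits y).

    Bit q_k of Q = sum_k q_k 2**k selects, for the term x_k y_{n-1-w-k} of
    column n-1-w, either AND (q_k = 0) or OR (q_k = 1).
    """
    assert 0 <= w < n and 0 <= Q < 2 ** (n - w)
    x, y = bits(a, n), bits(b, n)
    theta = 0
    for k in range(n - w):
        q_k = (Q >> k) & 1
        i, j = k, n - 1 - w - k
        theta += (x[i] | y[j]) if q_k else (x[i] & y[j])
    return theta


# Example: n = 4, w = 1, x = 0111 (a = 7), y = 0001 (b = 1).
print(generalized_index(0b0111, 0b0001, 4, 1, 0))   # all AND -> 1
print(generalized_index(0b0111, 0b0001, 4, 1, 7))   # all OR  -> 3
```

Each term costs a single AND or OR gate, so once $Q$ is fixed the coarse adjustment needs very little hardware.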

III. PROPOSED FIXED-WIDTH MULTIPLIERS WITH w ≥ 1

A lower truncation error can be obtained if more of the most significant columns are kept in hardware, although at the cost of area. Equation (8) can be rewritten as

$$\sigma_{\text{temp}} \approx \left\lfloor \theta_{Q,w} + [K] \right\rceil \tag{12}$$

where

$$K = (E_{\text{main}} + E_{\text{remain}}) - \theta_{Q,w} \tag{13}$$



In equation (12), the first term in the bracket, $\theta_{Q,w}$, is referred to as the coarse-adjustment term, and the second term, $[K]$, is referred to as the fine-adjustment term. Once the index is decided, the coarse-adjustment term can easily be realized by a simple circuit of AND and OR gates. The value of the fine-adjustment term, on the other hand, can be obtained from the expected value in the rounding operation after a statistical analysis [5].

To design a simple and realizable error-compensation circuit, we define two types of binary thresholding of $\theta_{Q,w}$ for bias estimation. They are described as follows:

Type 1

$$\sigma_{\text{temp},Q,w} = \begin{cases} \left\lfloor \theta_{Q,w} + [K_1] \right\rceil, & \text{if } \theta_{Q,w} = 0,\\ \left\lfloor \theta_{Q,w} + [K_2] \right\rceil, & \text{if } \theta_{Q,w} > 0. \end{cases} \tag{14}$$



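Type 2

$$\sigma_{\text{temp},Q,w} = \begin{cases} \left\lfloor \theta_{Q,w} + [K_3] \right\rceil, & \text{if } \theta_{Q,w} < n,\\ \left\lfloor \theta_{Q,w} + [K_4] \right\rceil, & \text{if } \theta_{Q,w} = n. \end{cases} \tag{15}$$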



Here $K_1$, $K_2$, $K_3$ and $K_4$ are the average values of $K$ satisfying $\theta_{Q,w} = 0$, $\theta_{Q,w} > 0$, $\theta_{Q,w} < n$ and $\theta_{Q,w} = n$, respectively.

The value of $K$ can be restricted to $[K_i] \in \{0, 1, 2, \ldots, 2^{w+1} - 1, 2^{w+1}\}$ for $i = 1, 2, 3, 4$ [6, 7].

By simulation, we obtained the values of $K$ for different generalized indices, as shown in Figure 2. We simulate the value of $K$ for smaller word lengths. For large word lengths, the value of $K$ is determined by statistical analysis instead, because exhaustive simulation there carries too high a computational load.




Fig. 2: Values of $K$ versus the index $Q$ for Type 2 binary thresholding, $n = 6$ (horizontal axis: all values of $Q$).



(A) Statistical Analysis

For bias estimation, we assume Type 2 binary thresholding, which is defined in equation (15). Analyzing equation (15), two cases can be taken into consideration.
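The statistical analysis rests on expected values of the partial-product bits. We assume that all input bits are independent and equally likely to be 0 or 1. Under that assumption, each AND term $x_iy_j$ has mean $1/4$ and each complemented sign-row term of column $n-1$ has mean $3/4$. By linearity of expectation, $E\{E_{\text{main}}\} = \frac{1}{2}\left(\frac{n-2}{4} + \frac{3}{2}\right) = \frac{n+4}{8}$ and $E\{E_{\text{remain}}\} = \sum_{k=0}^{n-2} \frac{k+1}{4}\,2^{-(n-k)}$. The sketch below checks these closed forms against exhaustive averages for small $n$. It illustrates only the unconditional expectations, not the conditional averages $K_1$ to $K_4$. The function names are hypothetical.

```python
from fractions import Fraction


def expected_terms(n):
    """Closed-form E{E_main} and E{E_remain} for independent, uniform input bits.

    Each AND term x_i y_j has mean 1/4; the two complemented sign-row
    terms of column n-1 have mean 3/4.
    """
    e_main = Fraction(1, 2) * (Fraction(n - 2, 4) + 2 * Fraction(3, 4))
    e_remain = sum(Fraction(k + 1, 4) / 2 ** (n - k) for k in range(n - 1))
    return e_main, e_remain


def exhaustive_terms(n):
    """Average E_main and E_remain over all 2**(2n) operand pairs."""
    tot_main = tot_remain = Fraction(0)
    for a in range(1 << n):
        for b in range(1 << n):
            x = [(a >> i) & 1 for i in range(n)]
            y = [(b >> j) & 1 for j in range(n)]
            for i in range(n):
                for j in range(n - i):          # columns k = i + j <= n - 1
                    t = x[i] & y[j]
                    if i == n - 1 or j == n - 1:
                        t ^= 1                  # complemented sign-row term
                    weight = Fraction(1, 2 ** (n - i - j))
                    if i + j == n - 1:
                        tot_main += t * weight
                    else:
                        tot_remain += t * weight
    count = 1 << (2 * n)
    return tot_main / count, tot_remain / count


for n in (4, 5, 6):
    assert expected_terms(n) == exhaustive_terms(n)
```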



CASE 1

From (13), we have

The generalized index is
CASE 2

$$2^{n-1}+1$$

This condition is met when

$$x_{n-2}y_1 + x_1y_{n-2} = 1$$

and

$$(x_{n-2}y_1 = \cdots = x_1y_{n-2} = 1).$$

Also

Therefore,

$$[K_4] = [E\{K\}] = \left[E\left\{E_{\text{main}} + E_{\text{remain}} - \theta_{Q,w} \,\middle|\, \theta_{Q,w} = n\right\}\right] \tag{17}$$



Thus, from equations (15), (16) and (17), we can derive a new error-compensation bias as



(18) 
Therefore, this constant approximation for $\sigma_{\text{temp}}$ can be mapped to the structure shown in Figure 3(a) for n = 8 [8], where A, ND, HA and FA denote an AND gate, a NAND gate, a half adder and a full adder, respectively. The logic diagrams of AOR, ANOR, AHA, AFA and NFA are shown in Figure 3(b).



Fig. 3: (a) Proposed low-error fixed-width 8 x 8 Baugh-Wooley multiplier with $\theta_{2^{n-1}+1}$; (b) logical elements.



IV. RESULTS 

Figure 4 compares the Booth and Baugh-Wooley multiplication techniques in terms of delay. From this comparison, we conclude that the Baugh-Wooley algorithm is the more efficient of the two. Table 1 shows the error performance of different fixed-width multipliers. Table 2 compares the standard and fixed-width multipliers in terms of the number of occupied slices, delay and number of IOBs.
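Table 1 reports maximum error and mean square error. The sketch below shows one way to compute such metrics exhaustively for two reference points. The first is uncompensated truncation of the $n$ least significant columns. The second is the ideal true-rounding bias $\sigma_{\text{temp}}$ of Eq. (5). The sketch does not implement the proposed compensation circuit. Normalizing errors by $2^{2n-2}$ (reading both operands as fractions in $[-1, 1)$) is our assumption, so its numbers are not meant to reproduce Table 1. Function names are hypothetical.

```python
from fractions import Fraction
import math


def lp_value(a, b, n):
    """LP: weighted sum of the Baugh-Wooley partial products in columns 0..n-1
    (sign-row terms of column n-1 complemented), including internal carries."""
    x = [(a >> i) & 1 for i in range(n)]
    y = [(b >> j) & 1 for j in range(n)]
    lp = 0
    for i in range(n):
        for j in range(n - i):
            t = x[i] & y[j]
            if i == n - 1 or j == n - 1:
                t ^= 1
            lp += t << (i + j)
    return lp


def error_stats(n, bias):
    """Largest-magnitude error and mean square error of
    P_approx = (P - LP) + bias(LP, n) * 2**n, normalized by 2**(2n-2)."""
    errs = []
    for a in range(1 << n):
        for b in range(1 << n):
            lp = lp_value(a, b, n)
            # P_approx - P = bias * 2**n - LP
            errs.append(Fraction(bias(lp, n) * (1 << n) - lp, 1 << (2 * n - 2)))
    max_err = max(errs, key=abs)
    mse = sum(e * e for e in errs) / len(errs)
    return float(max_err), float(mse)


def truncation(lp, n):
    return 0  # drop LP entirely, no compensation


def true_rounding(lp, n):
    return math.floor(Fraction(lp, 1 << n) + Fraction(1, 2))  # sigma_temp, Eq. (5)


n = 8
print("direct truncation:", error_stats(n, truncation))
print("true rounding:    ", error_stats(n, true_rounding))
```

Because the error is computed as $\text{bias}\cdot 2^n - LP$, any candidate compensation scheme can be evaluated the same way by passing a different `bias` function.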
Fig. 4: Comparison between the Booth and Baugh-Wooley algorithms in terms of delay.
Table 1: Comparison results of error among different fixed-width Baugh-Wooley multipliers.

| Multiplier | Width | Max error | Mean square error |
|---|---|---|---|
| Fixed-width multiplier, w = | n = 8 | -0.1156 | 3.3061 × 10^-5 |
| Fixed-width multiplier, w = 1 | n = 8 | -0.0078 | 8.2652 × 10^-6 |



Table 2: Comparison results of area and delay among different Baugh-Wooley multipliers (n = 8).

| Multiplier | No. of occupied LUT slices | Delay (ns) | No. of IOBs |
|---|---|---|---|
| Standard | 86 | 10.878 | 32 |
| Fixed width | 54 | 8.06 | 24 |


V. CONCLUSIONS

By properly choosing the generalized index and binary thresholding, we derive a better error-compensation bias to reduce the truncation error. We then construct a lower-error fixed-width multiplier that is area-efficient for VLSI realization. Moreover, a number of low-error fixed-width multipliers can be generated. The only constraint is choosing the right value of the index, which requires an exhaustive search.